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Abstract 

We propose a simulated annealing algorithm (called SNICA for "stochas- 
tic non-negative independent component analysis") for blind decomposi- 
tion of linear mixtures of non-negative sources with non-negative coeffi- 
cients. The de-mixing is based on a Metropolis type Monte Carlo search 
for least dependent components, with the mutual information between re- 
covered components as a cost function and their non-negativity as a hard 
constraint. Elementary moves are shears in two-dimensional subspaces 
and rotations in three-dimensional subspaces. The algorithm is geared 
at decomposing signals whose probability densities peak at zero, the case 
typical in analytical spectroscopy and multivariate curve resolution. The 
decomposition performance on large samples of synthetic mixtures and ex- 
perimental data is much better than that of traditional blind source sepa- 
ration methods based on principal component analysis (MILCA, FastICA, 
RADICAL) and chemometrics techniques (SIMPLISMA, ALS, BTEM). 



1 Introduction 

Decomposing linear mixtures (superpositions) into their components is a prob- 
lem occurring in many different branches of science, such as telecommunications, 
seismology, image processing, and biomedical signal analysis {e.g., EEG, EGG, 
fMRI) [T]. Blind source separation (BSS), in particular, deals with the case 
where neither the sources, nor the mixing matrix are known, the only available 
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data being the mixed signals The standard approach to BSS is indepen- 
dent component analysis (ICA) 13 El E] ■ In IC A one typically first applies a 
principle component transformation to "prewhiten" the data, i.e. to transform 
the covariance matrix into a unit matrix, and then uses some statistics going 
beyond second order {e.g. mutual information or temporal correlations) to min- 
imize interdependencies between the reconstructed sources by a final rotation. 

A similar problem arises in analytical spectroscopy, where one wants to de- 
compose spectra of chemical mixtures into the contributions of individual com- 
ponents in order to identify the emitting (or absorbing) mixture constituents 
and quantify their abundances. Mixture decomposition problems of this sort 
are classified in analytical chemistry as quantitative analysis of "black" multi- 
component systems 0. Compared to other applications of BSS, here we have 
two additional problems: 

• Spectra of different chemically pure substances often show very large over- 
laps. This is in conflict with the central assumption of ICA that the pure 
sources (mixture components) are statistically independent. One can relax 
this assumption partially (see, e.g., Refs. 012]), although one can hardly 
do without any assumption of this type. But, because of overlaps between 
spectra of similar compounds, this becomes a serious obstacle in spectral 
mixture decomposition. 

• In a vast majority of spectroscopic techniques, spectra are non-negative, as 
are also the coefficients of the mixing matrix (concentrations) . This gives 
rise to constraints which are not easily taken into account in typical ICA 
algorithms that heavily rely on linear algebra [HI E] . This would not be 
problematic by itself - in other applications signals are often non- negative, 
too. But it does pose a very serious problem, if, as is usually the case, 
there are spectral regions where some of the sources have zero (or very 
small) intensity ^Hl- In this case, prewhitening will in general produce 
spectra violating the non-negativity constraints, and restoring them later 
will be far from trivial. 

For these reasons, this "Multivariate Spectral Curve Resolution" (MCR) 
problem has led to a variety of chemometrics methods (for recent reviews 
see Refs. ^JEl^^ElE])- Generally, these algorithms resemble ICA in that 
they perform blind recovery of spectra and concentrations from the spectra of 
mixtures only, using no or little a priori information about mixture composition 
and/or about pure spectra. In view of the problems mentioned above, some of 
these MCR methods are less ambitious than ICA and estimate only feasible 
ranges for spectra and concentrations (see, for instance, a recent Monte Carlo 
approach ^21 and references therein). Also, the MCR analysis is often facili- 
tated by including some additional information such as unimodality and closure 
(or mass balance) ^]. The effect of overlap between spectra can be reduced by 
estimating their dependencies not from the spectra S{f) directly, but from their 
second derivatives d'^S{f)/dp [Z1E|. Another approach developed in chemo- 
metrics is based on the identification of "pure variables" |19| for all mixture 
components. A pure variable is a frequency (wavelength) at which only one 
of the components contributes. The pure variables, thus, approximately mark 
the regions where at least one of the spectral components is guaranteed to be 



2 



independent from all others. This idea is central to several advanced chemomet- 
rics algorithms, e.g., KSFA [201, SIMPLISMA 51, IPCA [22] and SMAC [231 . 
Also, Band- Target Entropy Minimization (BTEM) has been recently proposed 
which involves an explicit (made by visual inspection) choice of spectral features 
(target regions) to be retained in the course of constrained optimization |24j . 
While efficient and highly flexible, supervised band selection has the drawbacks 
of being neither fully automated, nor completely blind. 

In order to cope with the non-negativity problem, several methods have been 
proposed. In Refs. ^3123 ESI i non-negativity is incorporated in an approxi- 
mate way already in the the basic decomposition algorithm. For a discussion see 
Ref. |2ZI- While the above are general purpose ESS methods, there exist more 
specialized chemometrics (MCR) algorithms designed to use non-negativity (see, 
e.g., [121 1121 dllSl im El Ell)- In many of them, some variant of principle 
component analysis is used to first prewhiten the data, i.e. to minimize the 
off-diagonal elements of the covariance matrix. In a class of algorithms known 
as MCR-ALS, prewhitening and the interdependence minimization are done as 
in standard ICA, but non-negativity is then enforced in a postprocessing step 
using the alternating least squares (ALS) technique jHEniiini- In general, this 
gives better results than the algorithms without ALS. But, since the interdepen- 
dencies between the sources are not checked during the ALS step, one runs the 
risk of worsening them. In addition to that, applying post facto corrections to 
restore essential properties which were present in original data, but which were 
ruined by the main part of an algorithm, appears rather awkward conceptually. 

Here we argue that a more straightforward strategy is not only possible, but 
actually leads to de-mixing algorithms with notably improved performance. Our 
approach is conceptually very simple and is based on mixture decomposition into 
strictly non-negative least dependent components. Unlike other ICA methods, 
it does not resort to prewhitening. Instead, we split the linear de-mixing trans- 
formation into small random steps such that (i) on each step the non-negativity 
is preserved exactly, and (ii) the dependencies between the components tend 
to be reduced (the mutual information (MI) is minimized non-greedily) . Since 
greedy minimization would lead to trapping in local minima of MI, we use a 
Metropolis-Hastings Monte Carlo strategy (3T1 1221 ■ The algorithm (which we 
term SNICA, for Stochastic Non-negative Independent Component Analysis) is 
explained in the next section. The numerical results presented here show that 
our new method is more efficient than previous MCR algorithms, including ICA 
and ICA- ALS type methods |Z| based on the same MI estimator. 

2 Method 

As in most ICA and MCR approaches, we start out with the linear mixture 
model 

X = AS, (1) 

where X is the matrix of mixed signals (observed spectra), S are pure sources 
(spectra of individual mixture components), and A is the matrix of their con- 
centrations (mixing matrix). We assume that A is a square K x K matrix, 
i.e. the number of sources K is the same as the number of mixtures. X and S 
are of size K x N , where the number N of frequencies in the spectra is much 
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larger than K. The task is to estimate bhndly S and A, given only X. In 
our current setting the pure spectra and concentrations are assumed to be non- 
negative and we seek a solution (an estimate for S) in the form of statistically 
least dependent components Y — WX, with being an estimate for A up 

to scaling and permutation of components (ambiguities inherent to the linear 
mixture model). True and estimated mixing matrices can be compared using 
the Amari index P(W~^, A) which is a good measure of decomposition quality 
[3 El 13 • The Amari index vanishes when the recovered concentrations differ 
from the true ones only in scaling and permutation of components, and it in- 
creases as the quality of decomposition becomes poor. Thus, small ^ values of 
the Amari index are desirable. 

We assume that each column vector y.;, i = 1, 2, . . . , iV is a realization of 
the same vector- valued random variable Y — (Yi, 12, • ■ • , i^^:), and we look 
for solutions where its K components (i.e., the estimated individual spectra) 
have least mutual information. For a multivariate continuous random variable 
with given marginal and joint densities fijiuj) and ju(t/i, 2/2, Vk), the mutual 
information is defined as 

K 

/( Yi , 1^2 , . . . , Ik) = ^ i/(r, ) - (Fi , ^2 , . . . , >k) , (2) 

where 

H{Y^) = - j In^ijdyj (3) 

H{Yi,Y2,...,Yk) = - J tnn^idyidy2...dyK (4) 

are the differential entropies j3H| . Mutual information is zero if and only if the Yj 
are strictly independent {i.e., their joint density factorizes, ^ — Y[j A'j), and it is 
positive otherwise. Notice that the MI is sensitive to all types of dependencies, 
not only linear correlations. 

To assess the dependencies between the estimated spectra (given by rows of 
Y) we use the precise MI estimator based on nearest neighbor statistics, denoted 
here as /(Y), developed in Ref. Pl). It is derived from the Kozachenko-Lconenko 
estimate for Shannon entropy 



and 



D ^ 



H{Y) ^-^{k) + ^{N)+ log CI3 + ^ ^ log e(z) (5) 



1=1 



where ■(/'(•) is the digamma function, e(i) is twice the distance from the point 
Yi to its fc-th neighbor, D is the dimension of Y, cd is the volume of the 
ZJ-dimensional unit ball, and N is the length of sample. Mutual information 
could be computed by estimating the entropies © , Q separately, with the same 
value of k, and using Eq. jSJ. But this would involve very different scales in 
the marginal and joint spaces, leading to different biases. Looking for nearest 
neighbors on the same scale in the joint (Yi, Y2, Yk) and marginal Yj spaces 
significantly increases the chance that the biases in H{Yj) and H{Yi, Y2, Yk) 

^In practice, we find that good decomposition quality roughly corresponds to Amari indices 
P < 0.05, whereas P > 0.2 generally characterizes unacceptably poor performance. 
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cancel. This reduces the bias of the MI estimate. Since Eq. ||SJ) holds for any 
value of k, one can first look for k nearest neighbors in the joint space and then 
use the hyper-rectangles defined by them to calculate the number of neighbors 
in the marginal spaces. The estimate for MI is then: 

IiYuY2, ...,Yk) ^ i'ik)~{K-l)/k-{^P{ni)+i,{n2) + . . .+^inK)) + {K-l)HN). 

(6) 

where nj are the numbers of nearest neighbors in the spaces K, , j = 1, 2, . . . , if ; 
(. . .) denotes the average over all points yi, i = 1,2, . . . , N . Implementation 
details are given in Ref. 34 . 

Statistical errors in this MI estimator decrease with k, while systematic 
errors increase. Empirically, we found that the decomposition performance is 
best for the values of k in the range 10-20, reflecting a balance between these two 
types of errors. Also, spectral curve resolution by ICA was shown to be more 
efficient if performed in the derivative space [3 118j , therefore in all simulations 
discussed below we estimate the MI from the second derivatives of spectra with 
respect to frequency. 

With this numerical measure of dependence, we can formulate the mixture 
decomposition problem as minimization of /(Y) under the linear de-mixing 
transformation Y = WX, subject to the non-negativity constraint ^ yik > 0. 

In contrast to more traditional ICA methods that proceed by splitting the 
de-mixing matrix into a prewhitening transformation (eliminating linear cor- 
relations) and subsequent rotations |Hj , we approach the above constrained 
optimization with a Monte Carlo strategy. To this end, the de-mixing ma- 
trix W is factorizcd into small steps, W = lim W„ with W„ = M„W„_i, 

n — >oo 

where each M„ is chosen randomly and is either a shear transformation T 
in a two-dimensional (2D) subspace, or a rotation R in a 3D subspace (see 
below). The sequence of moves starts with Yq = X, Wo = I (identity ma- 
trix). A move Y„ = M„Y„^i is immediately rejected, if it results in at least 
one negative component yn,ik = (Y„)ifc. Otherwise {i.e., if all yn,ik > 0), 
it is accepted with probability one, if it leads to less dependent components, 
A/ = /(Y„) — /(Y„_i) < 0, while it is accepted with probability exp(— A//T), 
if A/ > 0. Here, T is a fictitious temperature. This is the famous Metropolis- 
Hastings Markov chain Monte Carlo method |31[ I32| . In order to make the 
method more greedy at later stages, when W„ is already close to the optimum, 
we use a simple annealing schedule |3()| where we start with high values of T 
and decrease it successively. Since both R and T matrices act only on low- 
dimensional subspaces (and are the identity in the complements), the difference 
A/ in each step can be computed from information in a 2D or 3D subspace 
only The convergence to the global solution is monitored by computing 
the full dimensional mutual information /(Y„), which guarantees that the least 
dependent, although not necessarily independent, components are reached. 

In our method, the mutual information is used to direct the search in 
the global space, whereas the non-negativity constraint is important in that 
it restricts solutions to physically meaningful subspace. This constraint has 
strongest effect, if large parts of the constraint boundaries yt = 0, k ~ 1,2, . . . ,K, 
are populated by the pure sources, as then any linear transformation is likely 

^Note that the non-negativity of estimated mixing matrix W^^ is not enforced explicitly, 
but it will be nonetheless satisfied to high accuracy. 
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Figure 1: De-mixing in three dimensions by shears and rotations. The scatter 
plots are: pure sources (a); mixture signals (b); components (c) resolved by 2D 
shears (trapped in a local minimum); components (d) recovered through opti- 
mization with 2D shears and 3D rotations about the diagonal. Also shown are 
the values of 3D mutual information (J) and the Amari index (P) of incomplete 
(c) and successful (d) decompositions. 

to violate non-negativity. This happens when each pure spectrum has a high 
chance to be occasionally close to its zero baseline. Such sources are called 
well-grounded and are common in spectroscopy. 

The idea of using 2D shear transformations in tandem with the non-negativity 
constraint was first proposed in the context of "Positive Matrix Factorization" 
[23I3H|. The matrix T('-'^(a) representing a shear y'- = + ayj, y'j = yj in 
the subspace is obtained from identity matrix by adding one off-diagonal 
element, 

4m = [T^'^'^ ia)]im = (5/,„ + aSu6,nj, i^je[l,K]; l,m ^ 1,2, . . . , K. (7) 

While 2D shear transformations are sufficient to decompose two-component 
mixtures, they appear to be too restrictive to sample efficiently higher dimen- 
sional de-mixing matrices. In higher dimensions, the minimization through a 
sequence of shears can easily get trapped in local minima of the Ml landscape. 
One such case is illustrated in Fig. ^ in which a thrcc-componcnt mixture of 
exemplary infrared spectra is considered. First, decomposition was attempted 
using only 2D shears, and Fig. ^ shows the components found by constrained 
minimization. This configuration corresponds to a local minimum of Ml (com- 
pare to Fig. ^ and notice the values of mutual information and very poor 



6 



a 

ij 


- ' jk 


~ ' kk 
~ ' ki 


' ik 







decomposition performance, P = 0.46). Since the components have reached the 
constraint boundaries and a minimum of the cost function has been found to he 
on this boundary, further de-mixing shears can be accepted only by increasing 
/(Y), which can be very slow. However, Fig.nj; suggests that the configuration 
needs to be rotated in such a way that, afterwards, the three principal "beams" 
of points could be brought close to the coordinate axes (and MI be decreased 
further) by subsequent shears. This works indeed, and gives the correct pure 
components with high precision, as seen in Fig. ^i. 

The rotations needed to improve the efhciency are around the space diagonals 
in randomly chosen 3D subspaces. Let us denote by R(*'''^)(a) the matrix which 
performs a rotation by the angle a around the diagonal in the positive octant 
of the subspace spanned by the components k). It has the elements 

: (l + 2cosa)/3, 

1/, N sin a 

: -(l-cosa) + ^, (8) 

1 , , sin a 
-(1 — cosa) 

in the subspace spanned by fc), while it is the unit matrix in the comple- 
ment, r;^*^"* = Sim for l,m ^ fc}. In the results shown below, random 
rotations and shears are used in ratio 1:1. 

Of course, more sophisticated combinations of affine transformations |39l 
I4U| can be designed to sample the de-mixing matrix. This might improve the 
performance of the algorithm further, especially in higher dimensions. But it is 
not clear whether the added complexity would be worthwhile. 

As seen from Eqs. ((Zj),®, the proposed moves T and R are parameterized 
by the "angle" a. For each Monte Carlo step its value is chosen randomly from 
some suitable range a € [— /i, h]. To realize an adaptive step-size control (where 
h is adjusted according to the progress of optimization), we use the following 
update rule 

fihn, if the move M„ is accepted; 
f2hn, if it is rejected, 

with the factors /i > 1 , /2 < 1 to be determined empirically. In our simulations 
we took /i = 1.06, /2 — 0.98 with initial ho = 0.2 — 0.3, although the decom- 
position performance was found to be practically insensitive to precise values of 
ho (see next section). 

For the temperature annealing, we used rather simple schemes with only 2 
or 3 cooling steps, with the temperature in the last step chosen close to zero. 
This is because we found that the detailed annealing schedule had no significant 
effect on the final results. It means, however, that the CPU times needed in the 
following simulations might not be optimal. The initial value of temperature Ti 
was chosen of the order of the mutual information of pure mixture components. 
Choosing Ti too high leads to unfocused search, while choosing it too small 
increases the risk to get trapped in false minima. Both can result in poor 
performance, whereas our empirical choice of Ti ensured efficient decomposition 
in most cases. 

At each temperature, the sequence of de-mixing moves is terminated when 
the mutual information has reached its global minimum, according to some 
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stopping criterion. In our code we kept track of the minimum of /(Y,i) during 
the entire run, 

/(Y„)mi„ - min /(Y„0- (10) 

n' <n 

Minimization is stopped when /(Y„)niin docs not decrease any more during the 
last M Monte Carlo steps, and the de- mixing matrix Wmin that corresponds to 
-^(Y„)min is given as output. The chosen values of M were some compromise 
between the speed and the quality of decomposition. 

With all the above together, the stochastic non-negative de-mixing procedure 
with 2 cooling steps is then as follows: 

1. initialize n = 0, Y„ — X, W„ — I; choose /i„ — hQ,T — Ti, M — Mi] 

2. pick random A: G [1,-ftr], a G [— /i„,/i„]; 

3. if n is even: 

compute shearing matrix M„ = T(*-')(q:) (Eq. |7J); 
else: 

compute rotation matrix M„ = R(*-''^^(a) (Eq. (jHJ); 

4. propose the move Z = M„Y„, V = M„W„; 

5. if 3 < 0: 

reject the move, keep Y„+i = Y„, W„+i = W„ and go to @; 

6. estimate A/ = /(Z) — /(Y„); pick random p E [0, 1]; 

7. if (A/ < 0) or (A/ > and e"^^/^ > p): 

accept the move Y„+i — Z, W„+i — V; 
else: 

reject the move, and keep Y„-|_i = Y„, Wn+i = W„; 

8. if /(Y„_|-i) reaches a new minimum: 

store it together with Ymin = Y„-|_i, Wmin = W„+i, and riynin ~ n + 1; 

9. adjust the step size hn+i according to Eq. Q; 

10. n = n + 1; 

if n < rimin + M: 

go to (2) 
else 

if T^Ta: 

choose T ~ T2 and M = M2, set n — rij^m, and go to (2) 
else: 

output Y,nin and Wmin and stop. 

3 Results and discussion 

This section reports the results of extensive tests of the new algorithm performed 
on a large sample of spectral resolution problems. Here we stick to the strategy 
of statistical validation of decomposition performance developed earlier [51 [7] . 
We stress that performing such tests on a statistically representative set of 
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precisely known mixtures is virtually the only way to examine real efficiency of 
a de- mixing technique. 

To compose a test set of randomized mixtures we first collected a pool of 
99 experimental infrared absorption spectra in the range 550 — 3830 cm^^ (822 
data points each) selected from the NIST database This pool of chemical 
species was designed to contain organic compounds that have common struc- 
tural groups (halogen-, alkyl-, nitro-substituted benzene derivatives, phenols, 
alkanes; alcohols, thiols, amines, esters) as well as a number of "outliers", i.e. 
structures dissimilar from all the rest. This was done to be able to test how 
well the algorithm can treat both - mixtures of highly dependent sources j2] 
and cases in which very deep minima of MI are to be located. After that, 4 
sets with increasing dimensionality if = 3, 4, 7 or 10 (6000 mixtures each) were 
constructed by randomly choosing K normalized pure spectra from the pool 
and applying random square mixing matrices ^ of proper dimension K. Thus, 
our assessment of performance is based on 24000 decompositions in total. 

We examine the performance of the new decomposition technique (SNICA) 
in comparison with the recently developed MILCA algorithm which uses the 
same MI estimator 0. In our previous study I?) MILCA was shown to be very 
efficient when compared to other (MCR) decomposition algorithms on typical 
spectral mixture problems (both synthetic and real). MILCA can be easily 
combined with derivative preprocessing (to deal with overlapping spectra) and 
ALS postprocessing (to improve non- negativity of estimates). Decompositions 
by MILCA were performed in the second derivative space [7], and the postpro- 
cessing ALS iterations were stopped after 1000 steps (we verified that longer 
runs did not result in noticeable improvements). 

Monte Carlo de-mixing was done with a two-step annealing scheme. The 
initial temperatures and stopping criteria were Ti = 0.02, 0.05, 0.5, 1 and Mi — 
1000,1000,2500,8000 (for K = 3,4,7,10, respectively). For the subsequent 
"cooling" stage these were set to T2 = 10~^ in all four cases and M2 = 
500,500,1500,3500. 

For each decomposition the Amari index P(W^^,A) was computed. Fig- 
ure El presents the distributions of the Amari indices with increasing dimension- 
ality K of the mixture problem. Judging from the most probable performance 
(peaks of the P distributions) and the minimal achieved values of P, SNICA 
clearly outperformed the other two algorithms. Also, the Monte Carlo de-mixing 
seems to gain advantage as the number of mixture components K increases. In 
the current implementation (with a very simple annealing scheme), SNICA is of 
order 10 times slower than MILCA and MILCA-I-ALS for K ^ 3, the algorithms 
have comparable speeds a.t K = 7, and SNICA is on average 2 times faster for 
K — 10 (with our stopping criteria). 

One of the mixture sets {K — 3) was also processed by two representative 
ICA algorithms: the widely used FastICA |42| (for references on its chemomet- 
rics applications see Ref. 0) and recently developed RADICAL 03 (both algo- 
rithms rely on decorrelation of data by PCA). The latter is similar to MILCA 
in many respects, notably in speed and performance 6 , but uses a nearest 
neighbour estimator for entropies H(Y) 03], not for MI. We observe that the 
performance of SNICA is superior to these two ICA techniques (Fig. |2t)- In 

^Concentrations aij were generated uniformly randomly in the interval [0, 1]. 
^Second derivative spectra were used. FastICA ran with its non-linearity parameter set to 
"skew" . RADICAL worked without augmentation of data. 
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Figure 2: Statistics of performance of Ml-based algorithms (MILCA, 
MILCA+ALS and SNICA) at decomposing 3-, 4-, 7-, and 10-component mix- 
tures. 
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Figure 3: Performance of SNICA in comparison to traditional ICA (a) and MCR 
(b) algorithms (3- and 10-component mixtures, respectively). 
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Figure 4: 500 SNICA decompositions {K = 3) represented by the grey points 
in coordinates (/(S),P). For 10 selected decompositions the one-standard- 
deviation error bars of estimated /(Y) and Amari indices P were computed 
from 250-500 runs with randomized method parameters in each case (see text). 

particular, a significant fraction (~ 10%) of decompositions by FastICA have 
P > 0.2 (not shown), indicating relatively low performance. 

A similar comparison was made with a state-of-the-art curve resolution al- 
gorithm SIMPLISMA (21 ^ with and without ALS corrections (Fig.Eb). We 
see that ALS significantly improves SIMPLISMA decompositions, but there is 
still a remarkable gap in performance between SIMPLISMA-I-ALS and SNICA. 
Fig. Eb shows the results for a high dimensional test set {K — 10). We should 
mention, however, that the advantage of SNICA over SIMPLISMA-I-ALS is less 
pronounced for smaller values of K, but it is still significant. 

Interestingly, decomposing mixtures of weakly and highly dependent sources 
by SNICA turns out to be almost equally efhcient (Fig. 01 where the mutual 
information scale spans the whole range of /(S) for triples of spectra in our 
test set). This is in contrast to traditional ICA methods whose inaccuracies 
correlate with the dependencies between pure sources, making decompositions 
of overlapping spectra particularly difficult ^ . 

In order to examine the sensitivity of the proposed technique to its pa- 
rameters we gathered the statistics of 10 decompositions with varied annealing 
temperature Ti, initial Monte Carlo step size ho and stopping criteria Mi, M2. 
Each decomposition was attempted with these parameters taken uniformly ran- 
domly from their feasible ranges for K = 3: Ti e [0.01,0.03], ho € [0.1,0.8], 
Ml G [500, 1500], A/2 e [0, 1000] (the "zero" temperature at the second anneal- 
ing step was kept fixed at T2 = 10^^). As reflected by the error bars on Fig. ^ 
the performance of SNICA as measured by the Amari index is rather robust 
with respect to the parameter choices, suggesting that Figs. I2I3I are also robust. 
The deviations from the mean final values of the cost function /(Y) are also 
small for all decompositions indicating that in all cases the global minima were 
reached. We note that these deviations may be simply due to statistical errors 

^We used the SIMPLISMA code given in Ref. 1181 . choosing the "offset" parameter ran- 
domly from the interval [0.01,0.1] which is about 1-10% of the standard deviation of the mixed 
signals. This is consistent with the choice made in Ref. 1241 . 
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SIMPLISMA 
(+ ALS) 


BTEM 


MILCA 
(+ ALS) 


SNICA 


toluene 


0.971(0.973) 


0.954 


0.987(0.994) 


0.973 


n-hexane 


0.994(0.995) 


0.992 


0.990(0.991) 


0.981 


acetone 


0.866(0.899) 


0.886 


0.933(0.943) 


0.972 


aldehyde 


0.943(0.953) 


0.899 


0.901(0.902) 


0.928 


33DMB 


0.576(0.963) 


0.983 


0.964(0.948) 


0.953 


DCM 


0.969(0.967) 


0.904 


0.909(0.966) 


0.964 


average 


0.887(0.958) 


0.936 


0.947(0.957) 


0.962 



Table 1: Decomposition of 6-coniponent experimental mixtures by MCR and 
ICA methods. The entries are normalized inner products between recovered and 
pure spectra of mixture constituents and their average values. Numerical data 
for SIMPLISMA and BTEM are from an independent study (Ref. Table 3 
which gives comparison with some other MCR methods - IPCA j22j, OPA-ALS 

m)- 

in the MI estimation |34| . 

A separate set of high-statistics simulations with SNICA (for K — 3, 4) 
was performed, in which the MI estimator of Ref. was replaced by that of 
Ref. 02] ^ leaving the rest of the SNICA code, all parameters and data un- 
changed. These estimators were found to be comparable in speed and efficiency 
when used in ICA on spectral mixtures. However, while the statistics of the 
Amari index (denoted Pk and Pd) were very similar for 3-component mixtures, 
it seems that the estimator of Ref. |34| performs better in higher dimensions. 
Notably, in 70% of cases the decompositions of four-component mixtures using 
this estimator were better {Pk < Pd), with the average values (Pk) = 0.022 
and (Pd) = 0.035. 

Now we turn to an experimental mixture problem to test SNICA in a real- 
istic application of spectral mixture decomposition. To facilitate comparison, 
the example we chose was the one of Ref. |2^ where the performances of sev- 
eral well-established curve resolution methods were compared. The experimen- 
tal data intended for blind separation consisted of FT-IR spectra (950 — 3200 
cm~^ at 5626 wavelengths) of 14 mixtures of 6 solvents (toluene, n-hexane, 
acetone, 3-phenylpropionaldehyde (aldehyde), 3,3-dimethylbut-l-ene (33DMB), 
and dichloromethane (DCM)). In addition, the pure spectra of mixture con- 
stituents were measured in separate experiments. Thus, this mixture problem 
offered a test case complicated by experimental features typical to analytical 
practice, such as non-linearities, instrumental noise, the presence of spectrome- 
ter background and impurities (residual water and CO2 in this case (24) ] . 

First, we performed decomposition by a PCA-based ICA algorithm (MILCA, 
with and without ALS postprocessing) reducing the dimensionality of the data 
from 14 to 6 on the prewhitening step. The estimated components were then 
compared to the pure sources, computing the normalized inner products between 
them [31^. The results are given in Table which also contains the respective 
values obtained by two MCR methods - SIMPLISMA jU] and BTEM [H]. 
SNICA was applied to this data set with a 3 step annealing schedule taking Ti = 

®We used the implomontation by P. Tichavsky available at 
|http: //www.utla. cas . cz/user_data/scieiitif ic/SI-dept/Tichavsky .html, 
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1.5, T2 = 0.1, Ta = lO"'^ and M = 6000 for all three steps. The Monte Carlo 
decomposition was done in the full 14 dimensional space (using second-order 
Savitzky-Golay smoothing differentiation |47| with a 5th order polynomial and 
a 51 point window), thus 14 components were resolved. Then, we selected only 
6 components which had the largest average contributions to the mixed signals 
(this is similar to retaining only the components with largest eigenvalues in the 
PCA dimension reduction). These 6 most intense components were compared to 
the pure sources (Table On this data set, the estimates produced by SNICA 
are more accurate than those obtained by SIMPLISMA and BTEM without 
ALS and compare favorably with the results obtained by MILCA and ALS- 
versions of these algorithms. Notice that in order to produce accurate estimates 
SIMPLISMA needs ALS postprocessing corrections |2HIE1 (see also Fig. Ob). 
With these corrections, its performance seems to match closely with SNICA, 
although higher statistics simulations (Fig. ^p) suggest that this might not be 
typical. Also, SNICA resolved all 6 components equally well, whereas the other 
methods produced 2-3 relatively poorly estimated spectra (notably, acetone and 
aldehyde appeared to be the most problematic due to their highly overlapping 
spectra). 

4 Conclusions 

In this paper we have proposed a new stochastic technique, SNICA, for blind re- 
covery of non-negative well-grounded sources from their linear mixtures. Avoid- 
ing the PCA decorrelation as being counterproductive in certain cases, our 
algorithm is based on a Metropolis Monte Carlo constrained minimization of 
mutual information between recovered sources. The de-mixing transformation 
is obtained as a sequence of random shears and rotations that resolves the least 
dependent strictly non-negative components. 

The SNICA approach can be related to some other PCA-free ICA methods 
(for example, Infomax 0S| or TDSEP 0^1) in that it does not resort to the some- 
what arbitrary decorrelation constraint [HIIE] introduced by PCA. Instead, it 
is replaced in SNICA by the non-negativity constraint which reflects a natu- 
ral property of spectral signals in the present application. PCA simplifies and 
speeds up the decomposition considerably, but should be taken with caution. 
If the sources are strongly correlated, then PCA produces components which, 
in spite of being uncorrelated, are completely unphysical. If an ICA method 
(consisting of a rotation of prewhitened components) is used without ALS post- 
processing, it has no chance to undo the errors made by PCA prewhitening. For 
a specific example see Ref. 0. ALS postprocessing improves the situation, but 
only partially. 

Here we have tested SNICA in spectral curve resolution on a representative 
set of mixture problems and experimental data. Our results indicate that the al- 
gorithm is superior in performance to its PCA-based counterpart (MILCA, with 
the same mutual information estimator) including the ALS-assisted version. We 
also found our algorithm to outperform SIMPLISMA-I-ALS which represents a 
good reference point among the state-of-the-art curve resolution chemometrics 
techniques. The method presented in this paper performs blind decomposition, 
but the use of additional a priori information about pure components (such 
as, e.g., sparseness of spectral signals) might possibly improve the performance 
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further. In blind separation of multivariate time sequences, the perfomance of 
MILCA was greatly enhanced if also mutual informations at different time lags 
were included in the minimization In principle, the same can be done for 
SNICA, but we have not yet made any tests to see how practical this is. 

The assumptions of non-negativity and statistical independence of source 
signals are met in a variety of practical problems in analytical spectroscopy 
and beyond. The range of applications of blind non-negative spectral mixture 
decomposition spans almost all quantitative analytical approaches that rely on 
spectroscopic measurements and subsequent mixture analysis, including, e.g., 
hyperspectral remote imaging EHI , microscopy of nano-structures [H^ and 
biomedical samples |7l l55j . and separation of independent light sources in multi- 
frequency astrophysical observational data j56j . SNICA can be applied to all of 
them. 

The source codes of SNICA, MILCA and the MI estimator are freely available 
online at ,http : //www . f z- juelich . de/nic/cs/ software, 
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